Goto

Collaborating Authors

 statistical framework


Task-Agnostic Machine-Learning-Assisted Inference

Neural Information Processing Systems

Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened a whole field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science challenges. One type of study that has quickly gained popularity employs ML to predict unobserved outcomes in massive samples, and then uses predicted outcomes in downstream statistical inference. However, existing methods designed to ensure the validity of this type of post-prediction inference are limited to very basic tasks such as linear regression analysis.


A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.


Suitability Filter: A Statistical Framework for Classifier Evaluation in Real-World Deployment Settings

arXiv.org Machine Learning

Deploying machine learning models in safety-critical domains poses a key challenge: ensuring reliable model performance on downstream user data without access to ground truth labels for direct validation. We propose the suitability filter, a novel framework designed to detect performance deterioration by utilizing suitability signals -- model output features that are sensitive to covariate shifts and indicative of potential prediction errors. The suitability filter evaluates whether classifier accuracy on unlabeled user data shows significant degradation compared to the accuracy measured on the labeled test dataset. Specifically, it ensures that this degradation does not exceed a pre-specified margin, which represents the maximum acceptable drop in accuracy. To achieve reliable performance evaluation, we aggregate suitability signals for both test and user data and compare these empirical distributions using statistical hypothesis testing, thus providing insights into decision uncertainty. Our modular method adapts to various models and domains. Empirical evaluations across different classification tasks demonstrate that the suitability filter reliably detects performance deviations due to covariate shift. This enables proactive mitigation of potential failures in high-stakes applications.


Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency

Neural Information Processing Systems

We present a novel statistical framework for analyzing stereotypes in large language models (LLMs) by systematically estimating the bias and variation in their generation. Current evaluation metrics in the alignment literature often overlook the randomness of stereotypes caused by the inconsistent generative behavior of LLMs. For example, this inconsistency can result in LLMs displaying contradictory stereotypes, including those related to gender or race, for identical professions across varied contexts. Neglecting such inconsistency could lead to misleading conclusions in alignment evaluations and hinder the accurate assessment of the risk of LLM applications perpetuating or amplifying social stereotypes and unfairness.This work proposes a Bias-Volatility Framework (BVF) that estimates the probability distribution function of LLM stereotypes. Specifically, since the stereotype distribution fully captures an LLM's generation variation, BVF enables the assessment of both the likelihood and extent to which its outputs are against vulnerable groups, thereby allowing for the quantification of the LLM's aggregated discrimination risk. Furthermore, we introduce a mathematical framework to decompose an LLM's aggregated discrimination risk into two components: bias risk and volatility risk, originating from the mean and variation of LLM's stereotype distribution, respectively.


Task-Agnostic Machine-Learning-Assisted Inference

Neural Information Processing Systems

Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened a whole field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science challenges. One type of study that has quickly gained popularity employs ML to predict unobserved outcomes in massive samples, and then uses predicted outcomes in downstream statistical inference. However, existing methods designed to ensure the validity of this type of post-prediction inference are limited to very basic tasks such as linear regression analysis.


A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance.


Review for NeurIPS paper: A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems

Summary and Contributions: The authors analyze the effect of gradient quantization for quantized training in a principled fashion, and introduce two methods that reduce the variance of the gradients when doing quantized training. Still I hold that if FQT is compared to QAT, you should quantize the weights and not keep shadow weights. This is what I meant with having the actual weights quantized, and the updates quantized as well. In most FQT applications that are parallelized in compute, you are very often memory movement bound, meaning you're playing a game of reducing memory as much as possible. The gradients are calculated on the fly, used and discarded in the backward pass, the memory overhead of them is small.


Towards Foundation Models: Evaluation of Geoscience Artificial Intelligence with Uncertainty

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has transformed the geoscience community with deep learning models (DLMs) that are trained to complete specific tasks within workflows. This success has led to the development of geoscience foundation models (FMs), which promise to accomplish multiple tasks within a workflow or replace the workflow altogether. However, lack of robust evaluation frameworks, even for traditional DLMs, leaves the geoscience community ill prepared for the inevitable adoption of FMs. We address this gap by designing an evaluation framework that jointly incorporates three crucial aspects to current DLMs and future FMs: performance uncertainty, learning efficiency, and overlapping training-test data splits. To target the three aspects, we meticulously construct the training, validation, and test splits using clustering methods tailored to geoscience data and enact an expansive training design to segregate performance uncertainty arising from stochastic training processes and random data sampling. The framework's ability to guard against misleading declarations of model superiority is demonstrated through evaluation of PhaseNet, a popular seismic phase picking DLM, under 3 training approaches. Furthermore, we show how the performance gains due to overlapping training-test data can lead to biased FM evaluation. Our framework helps practitioners choose the best model for their problem and set performance expectations by explicitly analyzing model performance at varying budgets of training data.


A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Neural Information Processing Systems

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance.


Statistical Framework for Clustering MU-MIMO Wireless via Second Order Statistics

arXiv.org Artificial Intelligence

This work explores the clustering of wireless users by examining the distances between their channel covariance matrices, which reside on the Riemannian manifold of positive definite matrices. Specifically, we consider an estimator of the Log-Euclidean distance between multiple sample covariance matrices (SCMs) consistent when the number of samples and the observation size grow unbounded at the same rate. Within the context of multi-user MIMO (MU-MIMO) wireless communication systems, we develop a statistical framework that allows to accurate predictions of the clustering algorithm's performance under realistic conditions. Specifically, we present a central limit theorem that establishes the asymptotic Gaussianity of the consistent estimator of the log-Euclidean distance computed over two sample covariance matrices.